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Abstract 


Articles in Marketing and choice literatures have demonstrated the need for incorporating person-level 
heterogeneity into behavioral models (e.g., logit models for multiple binary outcomes as studied here). 
However, the logit likelihood extended with a population distribution of heterogeneity doesn’t yield closed- 
form inferences, and therefore numerical integration techniques are relied upon (e.g., MCMC methods). 

We present here an alternative, closed-form Bayesian inferences for the logit model, which we obtain by 
approximating the logit likelihood via a polynomial expansion, and then positing a distribution of het¬ 
erogeneity from a flexible family that is now conjugate and integrable. For problems where the response 
coefficients are independent, choosing the Gamma distribution leads to rapidly convergent closed-form 
expansions; if there are correlations among the coefficients one can still obtain rapidly convergent closed- 
form expansions by positing a distribution of heterogeneity from a Multivariate Gamma distribution. The 
solution then comes from the moment generating function of the Multivariate Gamma distribution or in 
general from the multivariate heterogeneity distribution assumed. 

Closed-form Bayesian inferences, derivatives (useful for elasticity calculations), population distribution pa¬ 
rameter estimates (useful for summarization) and starting values (useful for complicated algorithms) are 
hence directly available. Two simulation studies demonstrate the efhcacy of our approach. 
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INTRODUCTION 


Whether it’s the 20,000+ hits based on a www.google.com search or the 1000+ hits on www.jstor.org 
or the hundreds of published papers in a variety of disciplines from Marketing to Economics (Hausman 
and McFadden 1984) to Statistics (Albert and Chib, 1993) to Transportation (Bierlaire et. al, 1997), the 
logit model plays a very prominent role in many literatures as a basis for probabilistic inferences for binary 
outcome data. In part, this is due to the ubiquitous nature of binary outcome data, whether it is choices 
to buy in a given product category or not, the choice to go to a given medical provider or not, and the 
like; and, in part, it may be due to the link between random utility theory and the logit model in which 
binary choices following the logit model are the outcome of a rational economic maximization of latent 
utility with extreme value distributed errors (McFadden, 1974). 

One of the recent advances regarding this class of models, which has made its use even more widespread, 
is its ability to incorporate heterogeneity into the response coefficients, reflecting the fact that individu¬ 
als are likely to vary on the attribute coefficients that influence their choices (Rossi and Allenby, 1993; 
McCulloch and Rossi, 1994). Whether this heterogeneity is modelled in an hierarchical Bayesian fashion 
allowing for complete variation (Gelfand et. al, 1990), in a latent-class way allowing for discrete seg¬ 
ments (Kamakura and Russell, 1989), or by using a finite mixture approach (Train and McFadden, 2000), 
incorporating person-level heterogeneity is now the “expected” rather than the “exception”. 

Unfortunately, the added flexibility that heterogeneity allows comes with a price - numerical compu¬ 
tation and complexity. That is, once one combines the logit choice kernel, a Bernoulli random variable 
with logit link function, with a heterogeneity distribution, closed-form inference is unavailable due to the 
non-conjugacy of the product Bernoulli likelihood and the heterogeneity distribution (prior). Therefore, 
numerical methods such as quadrature, simulated maximum likelihood (Revelt and Train, 1998), or Markov 
chain Monte Carlo methods (Gelman et. al., 1995) are commonly employed to integrate over the hetero¬ 
geneity distribution and obtain inferences for the parameters that govern the heterogeneity distribution 
(the so-called, population level parameters). For instance, in the case of a Gaussian heterogeneity distri¬ 
bution this requires the marginal integration of the product Bernoulli logit likelihood with the Gaussian 
distribution, to obtain means, variances, and possibly covariances of the prior. While faster computing 
and specialized software has made this feasible, this research considers an alternative to these approaches, 
a “closed-form” solution. 

That is, in this research we consider a closed-form solution to the heterogeneous binary logit choice 
problem that involves approximating the product Bernoulli logit likelihood via a polynomial expansion (to 
any specified accuracy), and then specifying a rich and flexible class of heterogeneity distributions for the 
response coefficients (slopes). If the response coefficients within individuals are independent, we model 
them as arising from the Gamma distribution (albeit we demonstrate how are results can be obtained 
for any multivariate distribution) or, more generally, a mixture of Gamma distributions (McDonald and 
Butler, 1990); if the response coefficients are not independent we model them as arising from a Multivariate 
Gamma distribution, which allows correlations among the coefficients. We then integrate, now possible in 
closed-form, the approximated logit model with respect to these families. Once the model is integrated 
with respect to the heterogeneity distribution, we then can either: (a) maximize the marginal likelihood 
and obtain Maximum Marginal Likelihood (MML) estimates of the population parameters and utilize them 
for conditional inferences (the empirical Bayes approach: Morrison and Schmittlein, 1981; Morris, 1983; 
Schmittlein, Morrison, and Golumbo, 1987) or (b) in the case where the parameters of the heterogeneity 
distribution are set informatively based on prior information, historical data, subjective beliefs, or the like, 
fully Bayesian inferences are obtainable. 

In this manner, as in Bradlow, Hardie, and Fader (2002) for the negative-Binomial distribution, and 
in Everson and Bradlow (2002) for the beta-binomial distribution, one can effectively incorporate prior 
information and allow shrinkage that Bayesian models attend to, but can also obtain closed-form inferences 
without Monte Garlo simulation efforts or quadrature that can be sensitive to the starting values and/or 
contain significant simulation error. We demonstrate the efficacy of our approach using two simulated 
studies, therefore supporting its use as an alternative method. In addition, we also demonstrate that as 
a by-product of the method, closed-form derivatives of the marginal distribution are obtained which are 
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often of interest in that they inform how the distribution (possibly in particular the mean and variance) 
of population effects would change as a function of a change in the decision inputs (i.e. covariates). 
These derivatives are also commonly (and directly) used in the computation of probability elasticities, thus 
providing the opportunity for optimization decisions. 

The remainder of this paper is laid out as follows. In Section^ we systematically lay out the problem 
formulation by deriving the likelihood, which provides the basis of our polynomial expansion (an application 
of a geometric series expansion), as well as discuss the types of data for which our method is applicable. 
In particular, the results presented here (albeit they are generalizable) are most applicable (as we discuss 
in Section nj to product categories for which the binary response rate is either rare (e.g. durable goods 
purchases (Bayus, 1992) and mail catalog responses (Anderson and Simester (2001)), or those for which 
the frequency of purchase is high (e.g. orange juice). 

Section 12 is an in-depth analysis of the case when the response coefficients are drawn from independent 
Gamma distributions. Sections l2.1 l a,nd l2.2l conta,in our key integration results demonstrating the conjugacy 
of the approximation to the binary data likelihood and the Gamma family of distributions f Theorem 12.211 . 
Details of the integration lemma and plots of the robustness of the Gamma family and its generalizations 
that we consider are in Appendix We discuss computational issues related to our series expansion in 
Section O In Section ion details of the method to maximize the marginal likelihood are given, and in 
addition we provide computational efficiency gains and guidelines as to the number of calculations that 
will occur using our method. In particular, we initially obtain closed-form solutions involving infinite 
sums. Using combinatorial results on systems of equations with integer coefficients, we show in Theorem 
IQ how these sums may be re-grouped to a lengthy (to be discussed and evaluated via simulation) initial 
calculation independent of the parameter values, and then a fast parameter-specific computation which 
makes the entire approach tractable. Thus subsequent computations of the marginal likelihood at different 
parameter values (necessary for optimization) is rapid. Additional details of these combinatorial savings 
are provided in Appendix 10 In Section we provide some simulations to demonstrate the efficacy 

of our approach. In Section ITiOl we compare our closed form series expansion with previous numerical 
techniques used to analyze these types of Bayesian inference problems. In the case of one observation per 
household, our series expansions have a comparable run-time to Monte Carlo Markov Ghain methods (in 
fact, the series expansions are faster); however, for multiple observations per household these numerical 
methods are typically faster, though our series expansions can still be implemented in a reasonable amount 
of time. In this manner, our approach is an alternative, albeit for many practical problems not one that 
is faster, but rather one that can be used to verify other (e.g. MCMC) methods. Some areas for future 
research and limitations of our approach, in particular the extension of our findings to a more general class 
of priors ('Theorem l2.4ll . and a more general class of covariates, are described in Section lT^ We show that, 
at the cost of introducing new special functions, we can handle any one-sided probability distribution for 
the priors. 

In Section 12 we generalize the results of Section [2 to allow for covariances. In Section Tl. 1.11 we derive 
a closed-form series expansion for an arbitrary multivariate distribution; however, if the distribution has a 
good closed-form expression for its moment generating function, more can be done. We concentrate on the 
case where the response coefficients are drawn from a Multivariate Gamma distribution, which allows us to 
have correlations among the response coefficients. The key observation is Theorem ld.il where we interpret 
the resulting integrals as evaluations of the Moment Generating Function which exists in closed-form for 
the Multivariate Gamma distribution. Thus we may mirror the arguments from Section|2and again obtain 
a rapidly convergent series expansion iTheorem Id. dll , and the combinatorial results of Section EZH and 
Appendix [0 are still applicable. 

Section ^contains some concluding remarks. 
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1 PROBLEM FORMULATION 


As the logit model and its associated likelihood are well understood, we briefly describe them in Section 
o and focus mainly here (in Section n~a on the geometric series expansion of the model. If we assume 
the parameters are independent (all zero covariances), then a tractable model is obtained by assuming 
that each is drawn from a Gamma distribution. This is described in detail in Section [21 in Section O 
we generalize the model by assuming the parameters are drawn from a multivariate Gamma distribution, 
which allows us to handle covariances among the parameters. In both cases we obtain closed-form series 
expansions. Further, a careful analysis of the resulting combinatorics leads to computational gains that 
make the approach feasible and attractive. 


1.1 Notation 

To describe the model, and to be specihc about the data structures addressed (and not addressed) in this 
research, we utilize the following notation. The jargon is drawn from the Marketing domain and is done 
for purely explicative purposes. As we demonstrate, our approach is applicable for a wide class of general 
data structures. 

Gonsider a data set obtained from i G {!,...,/} households (units) containing j G {1,..., J} product 
categories (objects; e.g. coffee) measured on f G {!,..., A^^} purchase occasions (repeated measures). At 
each purchase occasion, for each category j each household i decides whether or not to purchase in that 
category. 

As is standard, we define 

J 1 if household i buys in category j at time t 
1 0 otherwise. 


where pijt = Pioh{yijt = 1) is the probability of purchase of the category by the i**' household on its 
purchase occasion. Further, let P denote a set of attributes describing the categories, with corresponding 
values Xijt^p such that ■ ■ ■ ,Xijt,p). To account for differences in base-level preferences for 

categories, we define Xijtp = I defining category-level intercepts. Thus, multiplying over all households, 
categories, and occasions, we obtain that the standard logit likelihood of the data, Y = (yijt), is given by 


P{Y\P) 


I .7 Ni 

nnn 


2=1 j = l 




I + ’ 


( 2 ) 


where f3i = (Ap,..., A.p) is the coefficient vector for the household with variable p specific coefficient. 
It is the heterogeneity across households i in their (3i^p that we model in Section (21 as coming from the 
Gamma family of distributions, and in Section O as coming from a multivariate family of distributions. 

The marginalization of the likelihood, which is the problem we address here, is that we want to “hit” 
P{Y\f3) (integrate with respect to) a set of distributions (?(/3i_p|fI) depending on parameters fl such that 


P{Y\n) = J P{Y\P)g{P\n)dP 


(3) 


is available in closed-form. To accomplish this, we require a properly chosen series expansion of P(Y\P), and 
we describe its basic building block next, an application of the geometric series expansion. The goal is to 
obtain a good closed-form expansion of the marginalization of the likelihood for each choice of parameters 
n. To do so requires finding an appropriate conjugate distribution leading to tractable integration; we 
shall see that the Gamma (Theorem 12.211 and Multivariate Gamma (Theorem l.'I.lll distributions lead to 
integrals which can be evaluated in closed-form. Using such expansions, we then determine the value of U 
which maximizes this likelihood; this will allow us to make inferences about the population heterogeneity 
distributions. 
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1.2 Geometric Series Expansion 

To obtain closed-form Bayesian inferences for the logit model, we expand the likelihood P(Y\(3) given in 
@ by using the geometric series expansion: 


1 


l-z 



(4) 


This is directly applicable for our problem as P{Y\f3) is the product of terms of the form Our interest 

will be in expanding the denominator when u < 0. Note terms with u > 0 can be handled by writing 


While in theory we can unify the two cases (positive and negative values of u) by using the sgn function 
(sgn(it) = 1 if M > 0, 0 if rt = 0 and —1 otherwise), the sgn function is only practical in the case of one 
attribute (i.e. x is one dimensional and P=l); otherwise it is undesirable (untenable) in the expansions. 
In high dimensions (lots of households with lots of categories and attributes), the sgn function leads 
to numerous, complicated subdivisions of the integration space. This greatly increases the difficulty in 
performing the integration and obtaining tractable closed-form expansions, and hence is not entertained 
here. Details of the unification are available from the authors upon request, and applying it in practice is 
an area for future research. Instead, we describe below a specific set of restrictions that we employ, and 
the class of problems (data sets) where our expansions can then directly be applied. In Section 6, areas for 
future research to generalize our work to richer data sets are discussed. 

As mentioned above, to eliminate the need for the sgn function and to allow for straightforward expan¬ 
sions, we limit our investigations to the common set of Marketing problems (as described in Section 11.11 
and throughout) in which all 

T ^ 0; 

2 . (3^,p > 0 , 

3- Ep=i l3i,pXijt,p > 0. 

From a practical perspective, these restrictions indicate as follows. First, each Xijt^p > 0 is not particularly 
restrictive, as commonly utilized descriptor variables - prices, dummy variables for feature and display, 
etc..., as in standard SCANPRO models (Wittink et al. 1988), are all non-negative and are straightfor¬ 
wardly accounted for in our framework. Those variables which take signs counter to previously signed 
variables can be coded as /(Wjt.p), for instance —Xijt,p or exp(—ATyt.p)- 

Secondly, restriction of (3i^p > 0 may or may not be restrictive. If the variables which comprise Xijt^p 
are ones in which we want to enforce monotonicity constraints (Allenby, Aurora, and Ginter, 1995), or 
naturally one would expect upward sloping demand at the category-level (which may be much more likely 
than at the brand level), then this constraint is not at all restrictive, and in fact may improve the predictions 
of the model. To implement this, especially in the case of dummy coded Wjt.p, the least preferred level 
should be coded as 0 so that all other corresponding dummy variables have Wjqp = 1 and it is expected 
that Pi^p > 0. 

Our third constraint, which is not restrictive as long one of the X’s (e.g. price, coupon, etc...) is 
non-zero, is required so that we are not expanding 4 = in a polynomial series; if this condition fails, 
trivial book-keeping suffices. 

There are two important things to note. First, if all the /3i,p’s are negative, we may explicitly factor 
out the negative sign of each jSi^p, yielding terms such as —Xijt^p\/3i^p\. If this is the case, for simplicity we 
change variables and let Pi^p = |/3i,p|; thus, Pi^p and Xijt^p are now both non-negative, and we have minus 
signs in the exponents above. Thus we do not need to assume all persons have positive or all persons have 
negative coefficients, but rather (by recoding X to —X) that each person’s coefficients are all positive or 
all negative. Secondly, in totality, the restrictions above suggest that our model works only for categories 
in which the probability of purchasing in that category on any given occasion is strictly greater than 4 (if 
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coded as before) or less than ^ (if coded as in this paragraph), where again this can very person-by-person. 
Certainly an area for future research would be the ability, if possible, to relax some of these assumptions, 
and to empirically investigate the set of product categories for which these restrictions are not particularly 
binding (such as long-lasting durable goods). 

Thus (after possibly recoding), due to our restrictions, we only need to use 0) when > 0, which 

yields 


1 




e '=0*^5*^% > 0. 

^ijt —0 


(5) 


This is combined, as described next, with the Gamma or Multivariate Gamma family of distributions in a 
conjugate way. It is the constancy of the sign of X^^^Pi that allows us to use the same series expansion for 


all Pi- 


1.3 Expansion of P{Y\(3) 

Using the likelihood for the logit model given in (2), we have as follows: 


P{Y\P) 


I .1 Ni _xT g-v, 

_ _______ L g 


nnn,^ ..w. 

■.U=n.il + e "• 

I J Ni I J Ni 

IP. . .r ^ .V, 

JJ" g~(E/=i Et=*i VijtXijt,pj0i,p _ 

i=lp=l i=lj=lt=l 

PiiY\p)P2iY\P). 


1 + 


I J Ni 


1 -k 




( 6 ) 


Note that the first term, Pi{Y\P), is already an exponential function. This combines nicely with the 
Gamma and Multivariate Gamma distributions, and for each variable /3i,p, we simply have the exponential 
of a multiple of Pi^p. In fact, as we show later in Theorem rm it is this exponential form that leads 
to the result that all closed-form integrals are obtainable using the Moment Generating Functions of the 
heterogeneity distribution. It is the second term, P 2 (Y\P), that we expand by using the geometric series. 
We describe this now. 

The real difhculty in coming up with a conjugate family to the logit model is in the expansion of 
P 2 (Y\P). Using the geometric series expansion, we obtain 


P2{y\p) 


1 J Ni 

nnn 


j=l 



I J Ni oo 

2=1 kijt^O 


I J Ni oo 



2—1 j—1 t—1 kijt—0 


(7) 


For fixed household i, replacing yields 


p 2{ y \ p ) = n 

2=1 


OO OO 

E ••• E 

, fciii=0 kijN =0 


(-1)^1 


zl Z^t 


p 

n< 

p=i 


- E/=iEr= 




■Pi,, 


( 8 ) 
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Our problem therefore reduces to finding a good expansion for the integral of Pi{Y\(i)P2{Y\(i)g{(3\^). We 
assume is given by a product of Gamma or Multivariate Gamma distributions (or their generaliza¬ 

tions as described below), rich families of probability densities defined for non-negative inputs j3i^p which 
for certain choices of parameters have good, closed-form expressions for integrals against exponentials. For 
other parameter distribution choices and other probability distributions, at the cost of introducing new 
special functions we still have closed-form series expansions for the integrals (as we discuss in Sect! on 
The reason the Gamma and Multivariate Gamma distributions lead to closed-form expansions is that both 
have a good closed-form expression for their moment generating function; in Theorem I.S.ll we generalize 
our results to any multivariate distribution with a good closed-form moment generating function. 


2 UNIVARIATE CASE: GENERALIZED GAMMA 


In this section we combine all of the pieces in the case when the j3i^p are independently drawn from 
Gamma distributions with parameters {bp,np) (independent of i): the logit likelihood given in (|2), the 
geometric series expansion in 0, and an integration lemma given in EJ below which allows us to obtain 
series expansions for P{Y\^1). Then in Section ri.ll.l I we discuss how to re-group the resulting series expan¬ 
sions for computational savings. In Section|31we consider the more general case of choosing the /3i,p’s from 
a Multivariate Gamma distribution, where now for a given i there may be correlations among the Pi^p^s. 
2.1 The Gamma Distribution and its Generalization 

As Pi^p is assumed greater than 0, one flexible distribution to draw the Pimp’s from is the three parameter 
Generalized Gamma distribution (McDonald and Butler, 1990). The Generalized Gamma distribution is 
extremely rich, and by appropriate choices of its parameters, many standard functions are obtainable. It 
is defined for z non-negative by 


GG(z; a, b, n) 


iur"-'e-(f) 

bT{n) \bJ 


(9) 


For example, the following assignments of the parameters a, b and n yield well known distributions: 
lima^o GG(z; a, 6, n) is lognormal; GG(z;a,6,1) is Weibull; GG(z;l,6, n) is Gamma; GG(z; 1,6,1) is Ex¬ 
ponential; GG(z; 2, 6,1) is Rayleigh. For fixed a and n, the effect of 6 is to re-scale the units of z. That 
is, as z only appears as 6 may be interpreted as fixing the scale (i.e. the commonly interpreted scale 
parameter); a and n change the general shape of the Generalized Gamma. Hence, as opposed to the more 
familiar Gamma family of distributions commonly used in Marketing problems, which we focus on here, 
the Generalized Gamma has a second shape parameter, a, allowing for more flexible shapes. 

We provide in Appendix ro some plots of the Gamma family of distributions for various parameter 
values, and of a mixture of Gamma distributions, an even more flexible class to demonstrate its flexibility in 
providing a rich yet parsimoniously parameterized set of priors for the Pi^p. Although the results reported 
directly in this paper correspond to the heterogeneity distribution following a single Gamma distribution, 
they straightforwardly extend to a mixture of Gammas, where the mixture is a weighted sum of component 
Gammas. For each component of the mixture we can integrate its expansion term by term, and hence the 
entire weighted sum. From a practical point of view, this allows us to handle the situation of latent class 
modelling, in which the Pi^p come from a latent segment, each of which has its own Gamma parameters. 

As the geometric series expansion of the logit likelihood, as described in Sections 11.11 and 11.21 will lead 
to terms involving exponential functions, our key integration result arises from integrating an exponential 
function against a Gamma distribution. For simplicity we consider only the case of a Gamma distribution 
(a = 1), and discuss its generalization below and in Section ITU 

This assumption allows us to not only obtain closed-form expansions, but these expansions will be 
rational functions of the arguments of the Gamma distribution (which allow us to obtain tractable closed- 
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form expansions for the derivatives as well). For notational convenience let G{z]b,n) = GG(z;l,b,n) 
denote a Gamma distribution with parameters b and n. 

Lemma 2.1 (Exponential against a Gamma distribution). Consider a Gamma distribution G{z; b, n). 
For d>0, 


e-^‘^G{z-,b,n) 


(l + 6d)-"G 


{‘'Tm 



( 10 ) 


As G{z; Yq^,n) is a probability distribution, we obtain 


e ^‘^G{z;b,n)dz = — - 

z=o (l + 6d)- 


( 11 ) 


See Appendix roi for a proof. This closed-form integration result allows us to avoid resorting to Monte 
Carlo or other numerical techniques to approximate the integral for P(F|n), and is our integration “engine”. 
Later in Theorem IQ of Section 131 we generalize T;emma f2.1l hv interpreting it as evaluating the moment 
generating function of the Gamma distribution at —d. 


2.2 Series Expansion for P(y|f2) for the Gamma Distribution 

We assume the response coefficients are drawn from the Gamma distribution. Summarizing our inference 
problem, we need to investigate the integral for P{Y\V,): 


POO POO J POO 

/ •■•/ P{Y\^)g{m)d(i = W 

Jo Jo 2=1"'0 

where 



p 

Pii{Y\j3)P2i{Y\j3) G{l3i^p]bp,np)dl3i^p, 


Pu{y\id) 

P2^{Y\P) 

9{.m) 


p 

(X]/=l H t=l 

p^l 

OO OO ^ / N \ 


kiii—0 kijN^—O 

p 

bp^ Pp)- 

p^i 


p=i 


( 12 ) 


(13) 


Because of the conditional independence across z, we can evaluate each integral in 113 separately. We 
denote each of the i-integrals above by (for the household), where 


where 


^CX> poo 


P OO 


H, = 


fo Jo 


HE-- E (- 1 )^- 

p—lkiii—O kijN.—O 


= 1 t=l 


p (E/=i E t=i yijtXtjt,p^ (E/=i E t=i ^ijtXijt,p^ /3; 


E--E<-i)“nx ' 

kiii=0 kiJN,=0 p=l'^Pi,p-'J 


G{Pi^p, bp, np^dPi^p 
-^i'^^'’^G{l3i^p; bp, np)dPi^p, 


(14) 


J Ni J Ni 

d^i,p — ^ ^ P ^ k ' \ ^ ^ ) kijt- 

j=l t=l j=l t=l 


( 15 ) 





Therefore 


POO POO ^ 

P{Y\n) = ••• / PiY\/3)gimdp = TTH- 

Jo Jo 

Applying the integration lemma fLemma l2.1ll to m yields 





G(A, 




o^dPi^p 


1 

(1 + bpK,,pf^ ■ 


By combining the expansion in o with 03, we obtain our final result for Hi: 


(16) 


(17) 


Theorem 2.2. Assume the Pi^p are independently drawn from Gamma distributions with parameters 
{bp, Up). Then P{Y\il,) = nf=i Hi, where 




oo oo P 

Y. ■■■ Y n 

feill=0 fciJN;=0 p=l 


1 

(1 + bpK.^pf^ ’ 


K,, 


J Ni 

+ kijt)xijt^p. (18) 

j=i t=i 


Hence the log marginal distribution, logL = logP(y|H) = log(Hi), can be computed as the sum of the 
logarithm of m- This yields the desired closed-form solution. 


2.3 Computational and Implementation Issues 

2.3.1 Computational Issues and Gains from Diophantine Analysis 

While Theorem l2.2l vields a closed-form expansion for the marginal posterior distribution when the response 
coefficients are independently drawn from Gamma distributions, to be useful we must be able to efhciently 
determine the optimal values of the parameters H. As written, the number of terms needed in the series 
expansions are computationally expensive/impossible (i.e. the upper bounds are at oo). If every sum 
ranged from 0 to R, to have good expansions R would have to be prohibitively large. In this section we 
describe a more efficient way to group the summands to significantly decrease computational time and 
maximize the marginal posterior which will make this more computationally tractable for the Marketing 
scientist. We also note that due to the high degree of non-linearity and the infinite series expansion, there 
does not exist a closed-form solution for the optimal parameter values, H, by simply solving the first-order 
condition equation ^ = 0. We therefore use numerical methods to obtain the maximum marginal a 

posteriori values. 

One common approach to determining the optimal values is to use a multivariate Newton’s method. 
Unfortunately, in many of the simulations investigated here, the flatness of the surface around the mode 
and the multi-modality of the marginal posterior led to poor convergence; however, we expect for larger 
(and different) data sets, Newton’s method may become feasible and hence we include the closed-form 
first, second, and cross derivatives in Appendix ini We also note that one reason for our choice of the 
Gamma distribution was that the resulting expansions (see Theorem are elementary functions of 
the parameters bp and Up, and hence have elementary closed-form expansions for their derivatives. This 
facilitates calculations of elasticities, shown to be crucial in determining optimal marketing strategies 
(Russell and Bolton, 1988). 

We therefore instead resorted to evaluating 03 in a grid over the parameter space, and then choosing 
the value that maximized the marginal posterior. That is, the beauty and value of our expansions is that it 
allows us to calculate H^ rapidly even for many grid points. However, this is assuming that we can truncate 
each of the summations at a computationally feasible value, an approach we now describe. 

As the expansion stands in 03, without careful thought, only moderate sizes for J and Ni are feasible. 
In any numerical calculation, the infinite sums must be truncated. For simplicity and for explicative 
purposes, assume each sum ranges from 0 to i? — 1. As there are JNi summations, we have a total of 
terms to evaluate. Additionally, we have a product over p S {1,...,P} attributes, and then a 
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product or sum over i S households. If we ssume all Ni = N for purposes of approximating the 

computational complexity, the number of computations required is therefore of order P R'^^' = PI R^^. 

As we want to determine the values for the parameters bp, np that maximize the integral given in (18), 
two common approaches, Newton’s Method or evaluating in a grid, can theoretically be done (especially as 
we have explicit formulas); however, the number of terms makes direct computation from this expansion 
(i.e. without computational savings as described below) impractical at present computing speeds. For each 
parameter, we need to calculate on the order of PI ■ R'^^ terms for just one iteration of Newton’s Method 
or evaluation of logL for a grid approach. We discuss a way to re-group the terms in the expansion which 
greatly reduces the computational time and allows us to handle larger triples {R, J,Ni). 

We show below in detail that what allows us to succeed is that it is possible to re-group the computations 
in such a way that we have a lengthy initial computation, whose results we store in a data file. From this, it 
is possible to evaluate the log-likelihood (or derivatives if using Newton’s method) at all points of interest 
extremely rapidly. The reason such a savings as described below is possible, in some sense, is that the 
computations factor into two components, and most of the computations are the same for all values of the 
parameters and hence only need to be done once. 

Consider the case of P attributes: p £ {1, 2,..., P}. We then have cc-vectors 

^i,p — (^^ill,p j ■ ■ ■ 5 XiJNi (^9) 

Assume all Xip^p are integers; this is not a terribly restrictive assumption^, and can be simply accomplished 
by changing the scale we use to measure the Xijt^p^s. The advantage of having integer X’s is that we now 
have Diophantine equations, and powerful techniques are available to count the number of solutions to 
such equations and hence “judge” the feasible values of (i?, J,Ni). 

For notational convenience let k = {km ,..., kijpf.), 1 = (!,...,!) and 




l,P 


= EE 

j i 

J Ni J Ni 

= EE(yb* + +EE kijtXijt,p . 

j=l t=l j=l t=l 

Recall from Hi that when the Pi^p are independently drawn from Gamma distributions that 

oo oo P 


1 


H. = i:... i: 


( 20 ) 


( 21 ) 


fcill=0 fciJAT. =0 


Fix ani G {1,...,/}. We see that ll^ depends weakly on fc; by it^ . all that matters are the dot products 
k ■ Xi^p and the parity of (—1)*'^. Let r = (ri,..., rp) be a P-tuple of non-negative integers. For each k we 
count the number of solutions to the system of Diophantine equations k ■ Xi^p = Vp {p G {1,..., P}) while 
recording the sign of (—l)2f j StExplicitly, we may re-write 11811 from Theorem 12.21 as 

Theorem 2.3. Set Yi^p = VijtXijt^p and 

S{M) = {z; : u = (^;l,...,^>M),^'^ e {0,1,2,3,...}} 

K,{x,r,+) = #{k G S{JNi) :\fpG {l,...,P},k-Xi^p = rp,{-l)^'^ = +1} 

Ki{x,r,-) = 4j^{kGS{JNi)-.\/pG{l,...,P},k-Xi^p = rp,{-Vf'^ = -l}. (22) 


Assume 

p{Y\n) 


the Pi^p are independently drawn from Gamma distributions with parameters {bp,np). 
= ritLi Hi with 


H 


E 

reS(P) 


Ki{x,r,+) - Ki{x,r, -) 
np(l A bpYi^p + 6prp)"p 


Then 


(23) 


^This is not restrictive even for an attribute like price. Many studies are done with a discrete set of integer prices and in 
other cases, even if there were a fairly moderate number, the model can handle it albeit with increased computation. 
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There is a large computational startup cost in solving but future computations are significantly 
faster. We calculate Ki{x, r, ±) once, and store the results in a data file. Then, in subsequent calculations, 
we need only input the new values for bp and Up (or even better calculate the values for multiple bp and 
Up simultaneously if we are evaluating over a grid). The advantage of such an expansion is that successive 
terms involving larger r decay with k ■ Xi^p. Thus, we do not want to truncate the sum j ''' Sfc p 
having each sum range from 0 to i? — 1; instead, we want to consider the A:-tuples where the dot products 
are small, as the fc-dependence is weak (the expansion depends only on the value of k ■ Xi^p). From a 
computational point of view, there is enormous savings in such grouping. 

To determine how many terms are needed for this truncation to be a good approximation to the infinite 
expansion requires an analysis of Ki{x, r, +) — Ki{x, r, —). We sketch some straightforward, general bounds 
in Appendix [cl We do not exploit the gain from the factors of (1 + bpYi^p + bprp)~'^p so that our bounds 
will apply to the more general cases that we consider later (explicitly, the multivariate distributions with 
good closed-form moment generating functions of Section 13. One other point to note and which will 
greatly improve the convergence of the truncated expansions is to introduce translations in the Gamma 
distributions, which will give exponentially convergent factors. Assume each fii^p > e; for e small (e.g. 
0.0001), from a practical point of view such an assumption is harmless as a coefficient restricted to this 
range is not practically different than one restricted to be greater than or equal to 0. Explicitly, we draw 
Pi^p from G{z — e; bp, Up) rather than G{z-, bp, Up). Similar arguments as before yield 


H,: = 


n 


reS{P) p=l 


, {Ki{x, r, -I-) - K^{x, r, -))e~ 
(1 -I- bpYi^p + bpTpYp 


(24) 


As Ki{x, r, -b) — Ki{x, r, —) grows at most polynomially (see Theorem IG.dl in Appendix B), it is clear the 
above expansion converges (and for reasonable values, it will converge more rapidly than when e = 0). 


2.3.2 Numerical Simulations 

To demonstrate the efficacy of our approach given in we ran a series of numerical simulations. The 
results reported here are from two sets of the many simulations conducted, the remainder of which are 
available upon request. The first simulation design was chosen to be computational feasible; however, 
without loss of generality it contains all the elements that are required to generalize our results. In some 
sense, due to its maximal sparseness in information, it is the most strict test of our approach. 

Specifically, we report here first on a series of simulations with the following design: 

• P = 1, one attribute per observation, 

• JNi = 1, one brand and one observation per household, 

• / = 1000, one thousand households, 

• 0 < ki < R with R (the number of polynomial expansion terms) equal to 100, 

• a = 1 (the Gamma distribution) for various choices of b and n, and 

• untranslated Gamma distribution (i.e. e = 0 in CH)). 

All simulations were run using Matlab on a 1.9 GHZ athlon processor with 192 Mb of RAM, a very modest 
computing machine in today’s standards. 

For each bi and ni pair, 25 simulates were run by: (i) choosing I = 1000 values of from a 

G{z;bi,ni). The values of Xijt were selected from the values (1,2,3) with equal probability, and then 
arbitrarily scaled by a constant c to make the values of f3i^p ■ Xijt reasonable so as to allow for enough 0/1 
variation in the Uijt- Then for each of the 25 simulates, we numerically approximated P(Y\n) as given by 
HD and then maximized the resulting marginal likelihood, as a function of H, using a grid of values. In 
particular, we utilized a grid size of dimension 5x7 centered at (bi, ni) with spacings of .1 (this was reached 
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after considerable empirical testing to ensure enough fineness and that the solutions were not occurring on 
the boundary of the grid). 

We summarize our results in the table below: the true values of bi and ni, the mean and standard 
deviation over 25 replicates of the estimated values, and the t-statistics for both b and n. 


(61, m) 

(61, nT) 

(Trii) 

t-stat (61) 

t-stat (m) 

(5, 14) 

(5.21, 14.86) 

(1.40, 3.36) 

0.76 

1.29 

(10, 28) 

(10.72, 26.38) 

(1.46, 3.17) 

2.46 

-2.55 

(9, 9) 

(9.06, 9.64) 

(2.41, 2.37) 

0.12 

1.36 

(18, 18) 

(17.39, 18.62) 

(2.28, 2.40) 

-1.34 

1.30 

(11.5, 6.5) 

(10.65, 7.38) 

(2.46, 2.03) 

-1.73 

2.16 

(23, 13) 

(23.93, 12.63) 

(2.35, 1.43) 

1.97 

-1.30 


To assess whether the simulate values are in accordance with the true values, we conducted t-tests 
for each of the parameters and simulated conditions. This resulted in 12 significance tests, all of which 
correspond to a t-distribution with 24 degrees of freedom (note we did 25 simulates). Using the common, 
albeit conservative, Bonferroni adjustment method for multiple comparisons, we note the critical value 
of 3.167 in absolute value of which none of the comparisons is close (the corresponding value for one 
comparison is 2.064, which 9 of the 12 are less than). This suggests a very adequate fit of our approach 
and therefore the size of R in our polynomial expansions. Other simulations, not shown, suggested higher 
values of R provided even greater accuracy. 

The six set of simulations were chosen to be indicative of three possible settings, bi > ni, 6i = ni and 
bi < ni- We then replicated these three settings by scaling each of the values of b and n by a factor 
of 2. In this way we are able to show that it is not a particular ordering of bi and rii that matters 
nor the relative sizes of them. Note again, as above, that this simulation test of our approach is ultra¬ 
conservative in that we have tested our method using J-W = 1 and I = 1000, modest values. That is, with 
simply one observation per household and 1000 households, our approach is accurately able to reconstruct 
the heterogeneity distribution from which the f3i^p were derived. This result was also not dependent on 
7=1000, as shown below, and hence would have led to even faster processing time. Our belief is that this 
is a strong signal of the efficacy of our approach. 

A second series of simulations with more general conditions was conducted in which the number of 
attributes was increased to P = 2. The purpose was to see how well multiple Gamma distributions could 
be detected. There are now four parameters (6i, ui, 62, ?^2)- To have these simulations run in a comparable 
time as the previous, we chose I = 250, 7? = 40, a grid of size 4 x 4 x 4 x 4 centered at (5i, ni, 62,712) with 
a grid spacing of .5 units, and 10 simulates for each condition. We summarize the results below. 


( 61 , 711 , 62 , 712 ) 

( 6 i,rrr, 62 ,77^) 

((7^1, (Tni , (T 52 7 ^712 ) 

t-stat 

( 61 ) 

t-stat 

(" 1 ) 

t-stat 

( 62 ) 

t-stat 

(9, 9, 18, 18) 

(8.60, 8.75, 17.9, 18.1) 

(2.12, 2.20, 1.96, 2.25) 

-.60 

-.36 

-.16 

.14 

(11.5, 6.5, 23, 13) 

(11.3, 6.80, 22.3, 12.8) 

(1.77, 1.95, 2.08, 1.96) 

-.36 

.49 

-1.14 

-.40 

(5, 14, 23, 13) 

(4.85, 13.9, 24.15 14.05) 

(1.56, 1.76, 1.55, 1.28) 

-.30 

-.18 

2.35 

2.60 


This resulted in 12 significance tests, all which correspond to a t-distribution with 9 degrees of freedom 
(note we did 10 simulates). Using the common, albeit conservative, Bonferroni adjustment method for 
multiple comparisons, we note the critical value of 3.81 in absolute value of which none of the comparisons 
is close (the corresponding value for one comparison is 2.26, which ten of the twelve values are less than). 
This suggests a very adequate fit of our approach and therefore the size of R in our polynomial expansions. 

Our findings suggest again the general efficacy of our approach as none of the significance tests indicate 
divergence between the true and estimated parameter values. 

2.3.3 Comparison with Monte Carlo Markov Chain Methods 

As mentioned previously, and described in detail in Appendix o one aspect of our theoretical results 
that requires study is its computational feasibility due to the large number of summands. As the exact 
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results in Theorem 12. 2l nr Theorem 12. Hl have upper sum limits at infinity, we conducted an additional small 
scale simulation to assess the efficacy of our method under the truncation approximation. To act as a 
further baseline to our approach, we also ran a Bayesian MCMC sampler (a Bayesian multinomial logit 
model with non-conjugate gamma priors as per Section|2Jl to assess both the computation accuracy for our 
approach and its accuracy per unit time compared to established extant methods. All analyses were run 
on a Pentium IV 3.3MHZ processor with 2GB of RAM. For a more accurate comparison of times we wrote 
a C program rather than a Matlab program (as in Esa for evaluating the truncated sums. 

In particular, we simulated data for / = 1000 households, = 1 or 5 observations per household, 
generated by a multinomial logit model (see @) with P = 1 covariates. Each household’s value of (3i was 
drawn from a Gamma distribution^ with 6 = 5 and n = 14. To analyze our approach, we evaluated the 
approximated marginal likelihood (marginalized over Pi) over a grid (of size 21x21) using the Diophantine 
computation savings by only looking at sums with fci -I- • • ■ -I- kNi < R for various choices of R (as compared 
to Ni sums where each went from 0 to i?, which leads to the inclusion of many summands of negligible 
size). 

When Ni = 5 the Bayesian MGMG sampler for 6000 iterations for 3 chains (0.01667 seconds per 
iteration) took about 50 seconds, where the convergence diagnostic of Gelman and Rubin (1992) indicated 
convergence after approximately 3000 iterations (hence 9000 observations available for estimation after 
burn-in). For W = 1 the Bayesian MGMG sampler for 6000 iterations for 3 chains (0.01667 seconds per 
iteration) took about 20 seconds. 

For the C program based on our truncated series expansions, the approximations depend on the parity 
of R (if R is even then the final summands all have a factor of -1-1, while if R is odd the final summands all 
have a factor of —1). Thus if the resulting values at the grid points are stable for two consecutive values 
of R, we have almost surely included enough terms in our truncation. For W = 1 there was about a 2% 
difference in values when R = 100 and 101 (about 12 seconds); there was about a .2% difference in values 
when R = 200 and 201 (about 24 seconds). These run-times compare favorably with those of the Bayesian 
MGMG sampler. For more observations per household, however, the Bayesian MGMG sampler does better. 
The problem, as shown in Appendix o is that the number of summands with fci -I- • ■ • -I- < i? is a 

polynomial in R of degree Ni. When W = 5 and R = 6 the program ran for about 40 seconds, and 
when Ni = 5 and R = 7 the run-time was about 64 seconds; while these values of R are too small to see 
convergence in the truncated series, for these data sets the series expansion is still implementable, though 
at a cost of a signihcantly greater run-time. 

Thus our series expansions, with the present computing power, are comparable to existing numerical 
methods only in the case of one observation per household, though they can still be implemented in a 
reasonable amount of time for multiple observations. 

2.4 Generalizations of the Univariate Gamma Distribution 

We describe several natural generalizations of our model. We assume for each i that /3i,i,...,are 
independent below; see Section |3 for removing this assumption as well. 

At the expense of using special functions, we may easily remove the assumption that the Pi^p are 
drawn from a Gamma distribution; however, as the research currently stands, we still must assume the 
Pi^p are drawn from one-sided distributions. In the case of just one attribute, it is straightforward to 
generalize our methods to handle Pi^p drawn from any distribution (we split the integration into three 
parts, Pi^p < ePPi^pl < e,Pi^p > e); a natural topic for future research is to handle Pi^p drawn from 
two-sided distributions with multiple attributes. 

2.4.1 Weakening Pi > e 

The assumption that > e is problematic if we desire to test the hypothesis that Pi^p = 0. To this end, 
for each Pi^p we consider instead of an e-translated Gamma distribution a 0-point mass Gamma Distribution 

^Computation time was essentially invariant over the exact values of b and n chosen. The run-time is a polynomial in R 
of degree Ni; further empirical testing is needed to ascertain how well our approach works in these settings. 
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given by 

“t“ (1 ^p')G(^Pi,p ^p: ^p')' (^^) 

In the above, 6{x) is the Dirac Delta Functional with unit mass concentrated at the origin; Wp € [0,1] is 
a weight and can be interpreted as a “weight of evidence” for Pi^p = 0. It is easier, though by no means 
necessary, to obtain closed-form integrals if we assume instead that we have 

/ I 

Wp Si^Pi^p^dPi^p “b (1 Wp^ G(^Pi^p e, bp^ fip'jdPi^p. 

i=l i=l 

That is, for a given attribute, either Pi^p = 0 for all households, or they are all drawn from a translated 
Gamma distribution. 

Note, we now have either translated Gamma distributions or delta masses. If everything were a delta 
mass, we would be left with (—1)^'^. In this case, we would not use the geometric series expansion, as the 
integration is trivial. 

The expansions are more involved if we have some delta masses and some non-delta masses (varying 
across attributes). We would have to go through the same arguments as above to estimate convergence, 
but instead of having P terms in the exponentials, we would have P — 1, P — 2, and so on. 

A stronger assumption, leading to the easiest integration, is the following: 

IP IP 

-nn 6{Pi^p)dPi^p + (1 - w) nn Gi^Pi^p e, bp^ Tip^dPi^p. 

i—lp—l z—1p—1 

That is, either everything is from a delta mass, or everything is from some translated Gamma distribution, 
with a translation of e. In this instance, our approach can be directly applied. 


2.4.2 Linear Combinations of Gamma Distributions 

We can increase the flexibility of the model by considering linear combinations of Gamma distributions: 


Wp^lG(^Pi^p 


e; 6p,i, np,i) H-h WpfiG{Pi^p - e; bp^c, np,c), 


(28) 


where 


Vp : Wp^i H-h Wpfi = 1, Wp^c G [0,1]. 


We can regard the weights as either new, additional parameters, or fixed, and becomes 



C 

^ ^ Wp^cGi^Pi^p e, bp^cf rip^c)dPi^p 

C=1 


C 


E 


Wp,e e 

(1 + bp^f.Ki,p) 


(29) 


(30) 


The essential point is that, in the above integration, Ki p does not depend on c. Thus, we will still have the 
computational savings, and need only count the solutions to the Diophantine system once. The difference 
is we now have more terms to evaluate, but we still have rapid savings, and becomes 

Theorem 2.4. Notation as in Theorem ro let the Pi^p be independently drawn from linear combinations 
of Gamma distributions as in Then P{Y\n) with 


K = 


c 

E E 

C=1 r^S(P) p=l 


n 


.-Yi,. 




{Ki(x, r, -b) - Ki{x, r, -))e 


(1 


bp^cYi^p 


V' 

o) 


(31) 
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2.4.3 More General One-Sided Distributions 

There is no a priori reason or necessity to choose f3i^p from a Gamma distribution G{(ii^p\ bp, Up) (or linear 
combinations of these). Because we were assuming the (ii^p > 0 (later, when we wanted Pi^p > e, this merely 
caused us to study translated Gamma distributions), it is natural to choose a one-sided, flexible distribution 
such as the Gamma distribution. If we take any one-sided distribution and translate, we obtain a similar 
formula as in (I23II or 12411 . The only difference would be the functional form of the non-Diophantine piece. 
The exponential decay (arising from the requirement that Pi^p > e) is still present; it came solely from the 
geometric series expansions. 

As we have not been using properties of the integration of an exponential against a Gamma distribution 
to obtain our convergence bounds, our arguments are still applicable; however, in general we don’t have 
simple closed-form expansions with elementary functions. At the cost of introducing new special functions, 
we could handle significantly more general one-sided distributions. Our integration lemma fT;emma, l2.1ll is 
trivially modified, and we still have computational savings. As we shall see in Theorem 13.II our method is 
directly applicable to any distribution (univariate or multivariate) with a closed-form moment generating 
function. 

There are two costs. The first is the introduction of new special functions in the expansions of the H^; 
however, by tabulating these functions once, subsequent evaluations can be done efhciently. The second 
difficulty is that, if one attempts to use Newton’s Method, closed-form elementary expansions of the 
derivatives are no longer available in many cases; for cases where the expansions exist, one must calculate 
the partial derivatives in a manner similar to that in Appendix IHI (for the Gamma distribution). 


3 INCORPORATING COVARIANCES: THE 
MULTIVARIATE GAMMA MODEL 


Our previous investigations have assumed that the households’ Pi^i,..., Pi^p are independent and that 
Pi^i ,..., Pi^p are independently drawn from Gamma distributions (with different parameters for each Pi^p). 
In il2.4l we have seen how to generalize to the case when the Pi^p are still independent but drawn from other 
distributions. We now discuss another generalization, namely removing the independence assumption of 
the Pi^p and thus allowing non-zero covariances. 

Let us assume that the households are still independent; however, Pi^i,... ,Pi^p are no longer assumed 
to be independent. Let us assume that for each household these are drawn from some distribution 

G(/3,,i,...,ft,p;6) = G{Pp,h), (32) 


where b is some set of parameters. Our previous work in Section [3 is the case 


b 

G{Pp,b) 


{bi,lj ■ ■ ■ ibi^p, , 

p 


n 


1 


A.i 


r(?T-^ p) y bi^p 


tli^p) 


(33) 


By using a multivariate distribution we can capture correlations between the coefficients of different brands 
(the univariate distribution of 13311 has all covariances zero), or in general the coefficients of the covariates. 
As we no longer assume that G factors into distributions for each Pi^p, is no longer valid and we 
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now must analyze, for each household i 


H,: = 


CXD poo OO 


0 ^0 
p 


E ••• E (-1)''- 

kiJN^—O 


J V' t- 
= 1 2-, t=i 




p=i 


^OO pOO 




feiii=0 fcijAr-=0 


/O ^0 


where as before 


J iVi J Ni 

^i,p — 'y ^ y ^ijt')^ijt^p-J ■ 1 — y ) ^ ( k^ji. 

j=i t=i j=i t=i 


Of course, for general G it will be difficult to evaluate El in a tractable form for numerical computa¬ 
tion. One of the advantages of our previous method is that the integral of an exponential and a gamma 
distribution was another gamma distribution, and thus the integrals which arose were simple expressions 
of the parameters. 

There are two natural ways to proceed. For a general multivariate distribution G we will be unable 
to develop a closed-form expression for the integral in El that is analogous to the one we found for the 
case of the /3i,p’s independently drawn from Gamma distributions lLemma l2.1ll . Instead we could series 
expand the remaining exponentials, recognizing the resulting integrals as the moments of the multivariate 
distribution. 

Alternatively, if G has a known closed-form expression for its moment generating function, then we may 
recognize El as simply evaluating this moment generating function at (ti,..., tp) = ..., —Ki^p). 

Unfortunately sometimes the moment generating functions only exist for suitably restricted (ti,...,tp), 
in which case we combine this approach with the series expansion for the remaining P-tuples. We present 
these details below. 


3.1 Series Expansion for P(F|f2) 

3.1.1 General Multivariate G 

For each of the P exponential terms we may expand in a geometric series. 




= E 


{-K,^p|3^,pY^ 


G=o 


f I 

<.p. 


Thus (IMIl becomes 


H. = E ■■■ E 1 - 1 )“ 

fciii—0 kijMj^—0 

CXD CXD 

= E E )-!)“ 

fcill=0 kijN.—O 


poo poo 




G{P^;b)dPi^i ■ --dPi^p 


oo poo 


g ( ( K,,pPi,pY- ^ ^ ^ 


i,...,fp=0 


E ■- E (-1)“ E 

kiii—0 kiji^.—O ii,...,£p—0 


h\ ip\ 

{-K,,iY^ ■ ■ ■ 


£i!---£p! 




(36) 


(37) 
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where 




poo pc 

Jo Jo 


l^i \ ' ’' ' ^(A,i) ■ • •) A,p; b)dl3i^i ■ ■ ■ dPi^p. (38) 

lo Jo ' 

We thus obtain a closed-form expression again, except now we have additional summations over 
£i,...,ip. Here is the (£i,... ,ip) non-centered moment of the distribution G. For a general 

distribution these may be difficult to evaluate explicitly; we need a one-sided distribution (with some 


parameters b) that is flexible in terms of shape as well as having good formulas for the moments 

Our combinatorial results from Section 12.3.11 (where we were able to re-arrange calculations to save 
computational time) depended crucially on the fact that the exponential versus gamma integrals from 
before led to simple expansions such as (1 -I- bpKi^p)~^p; these expansions did not depend on the actual 
values of kill,, kijNi but only some linear combinations (dot products). Thus we still have combinatorial 
savings in the kijt sums. 


3.1.2 Multivariate G with Closed Form Moment Generating Functions 

Let (3i = (/3i,i,..., /3i,p) be distributed according to a multivariate density G(/9i, h). The moment generating 
function of G is given by 

Mp,{ti,...,tp) = E , (39) 

where the expectation is with respect to G; i.e., 

Mp^{ti,...,tp) = [ ■■■ [ G(A,i,...,A,p;W,i---ciA,p. (40) 

'^01,1 0i,P 

Depending on the distribution, the moment generating function may exist for all P-tuples (ti,... ,tp), or 
instead only for suitably restricted P-tuples. We immediately obtain 


Theorem 3.1. Assume the moment generating function Mp.ifi,... ,tp) for the multivariate distribution 
G{Pi, b) = G(/3ip,..., Pi,p, b) exists for all {ti,... ,tp). Then P{Y\£}) = Hi, where 


= Y. ■■■ E (-l)"'Mft(-iF,,i,...,-iF,,p), (41) 

fcill=0 kiJN^=0 

with 

J Ni J Ni 

Hi,p = 'y ]'y + kijt)xijt,p, k-i = 

j=i t=i j=i t=i 


3.2 Multivariate Gamma Distributions 

We list several versions of Multivariate Gamma distributions (with non-zero covariances) that have closed- 
form expressions for their moment generating functions, and thus satisfy the conditions of Theorem 13.11 
For additional multivariate distributions see Appendix H All page and equation references in Sections 
uni and and are from Kotz, Balakrishnan and Johnson 2000. 


3.2.1 (Cheriyan and Ramabhadran’s) Bivariate Gamma (pages 432 435) 

Recall the Gamma distribution with parameter 0 > 0 is given by 


Priy) = 


\ T{9) ^e y if ?/ > 0 
0 otherwise. 


It has mean 6, variance 9, and its moment generating function is 

Mvit) = (l-^)-^ 


(43) 


(44) 
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which exists for all t < 1. Let Yi for i G {0,1,2} be independent Gamma distributed random variables 
with parameters 9i, and for i G {1, 2} set Xi =Yq + Yi. The density function of {Xi,X 2 ) is 


PXi,X2{xi,X2) = 


-(X 1 +X 2 ) 


^min(aii ,X2) 


- yoy^-\x2 - yof^-^ey°dyo, 


r(0o)r(0i)r(02) Jo 

the bivariate gamma density, equation (48.5). The correlation coefficient of Xi and X 2 is 

do 


Corr(Xi,X2) = 


y / (00 + ^l)(6'o + 6*2) 


(45) 


(46) 


As 00 > 0 (since Yo is Gamma distributed) the correlation coefficient is positive, see (48.7). The moment 
generating function is 

MxuX^itiM) = (l-ti-t2)“®“(l-ti)-®^(l-t2)"®= (47) 

and exists for all (ti,t 2 ) with G + ^2 < 1 and ti < 1, see (48.10). 


3.2.2 Multivariate Gamma Distributions 

We may generalize the arguments from Section ri.2.1l and consider the joint distribution of Xp = Xp{Yo + Yp) 
for i G {1,...,P} and Ap > 0 with Yo,... ^Yp independent Gamma distributed random variables with 
parameters 9o,...,6p. If P = 2 Ghirtis has called this the double-gamma distribution. For general P 
it is similar to Mathai and Moschopoulos’ Multivariate Gamma distribution (pages 465-470), and taking 
0p = 1 we obtain Freund’s Multivariate Exponential distribution (pages 388-391). The moment generating 
function is 


Mxi,...,Xp{ti, ■ ■ ■ ,tp) 




,Aiii(yo+yi) + ---+Aptp(Yo+Xp) 


^(AiiiH-hAptp)Yo 


...jg^gAptpYpj 


(1 _ Aiti-Aptp)-®«(1 - AiG)-^^ • • • (1 - Xptp)-^^, 


(48) 


which exists for all (ti,... ,tp) such that AiG -b • • • Xptp < 1 and each tp < Xp For our applica¬ 
tions such restrictions are harmless, as in Theorem rm we evaluate the moment generating function at 
(—Pi,i,. • ■, —Ki^p) and each Kt^p > 0. 

In fact, we may generalize even further. 

Lemma 3.2 ((Generalized) Multivariate Gamma Distribution). LetYo^i,...,Yo^MjYi,...,Yp he 
independent Gamma distributions with parameters 0o,i) • ■ • j ^o,m, 9i,... ,9p. For Xp^m, Ap > 0 let 

Xp = (Ap^iYf),! + • ■ ■ + ^p,m4o,m) + ApTp, pG{l,...,P}. (49) 

Then the moment generating function is 

M / p \ p 

Mxi,...,Xp{ti, ■ ■ ■ ,tp) = f 1 “ E ) ■ IT 

m—1 \ p—1 / p—1 

and exists for all tuples (fi,... ,tp) where X^p^i Xp^mtp < 1 for each m and tp < A“^ for each p. For r ^ s 
the covariances are 


M 

Govar(Xp, Ais) = E (51) 

m—1 
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and the correlation coefficients are 


Xr,m^s,m ^0,r] 


Corr(Xr,Xs) = , 

y ^r,1^0,1 + • • • + ^r,M^O,M + ^ ^^ 00,1 + ' ’ ' + X'^ j^Oq^m + X'^Y^ 

Proof. The moment generating function is 


(52) 


= E 
= E 


gEp=i('^P,iT),iH-hAp ,M ^0,M+Apyp)ip 


fl2p=i ^p,i^p^o,i 


•E 


2,5Zp=l ^p,M^p^0,M 


.E[e^i*i^i] ..-E 


M / P 

— I 1 ^ ^ ^p,mtp 

m—1 \ p—1 


J^(l Xptp) 

p=i 


(53) 


which exists for tuples (ti,... ,tp) where X)p=i Xp^mtp < 1 for each m and tp < A“^ for each p. For our 
applications such restrictions are harmless, as in Theorem 13.II we evaluate the moment generating function 
at , —Ki^p) and each Ki^p > 0. The covariances and correlation coefficients are easily determined 

in this case. As the To,m and Yp are independent, for r ^ s 

Covar(Xp, A^) = E [(Ap^iTo,! + ■ ■ ■ + Xt^mYq^m + ApAr)(As,iFo4 + ■ ■ ■ + As,mAo,m + A^A^,)] 

= — E [Ap^iTo,! + • ■ • + Xt^mYq^m + XrYr] ■ E [AspTo,l + • ■ • + Xs^mYq^m + AsAs] 

= E [(AppFo.i + ■ ■ ■ + Ap_mAo,m)(As,1^0,1 + • ■ • + Xs^mYq^m)] 

= — E [AppTo,! + • ■ • + Ap^mAq.m] • E [As_ilo,i + • • • + Xs^mYq^m] 

M M 


EE Xr,uXe,v (E[yo,iiAo,p] — E[yo,u] ' E[yo,'!;]) 

u—1 

M 

^ ^ ^r,rn^s,rn^^^{^0,rn) 
m—1 
M 


and the correlation coefficient follows immediately. 


(54) 


□ 


Finally, to obtain an even more flexible distribution, we may consider linear combinations of multivariate 
Gamma functions. The methods of Section are immediately applicable and yield an extension of 
Theorem ED 


3.3 Computational Savings for Multivariate Distributions 

For a general multivariate distribution G as in EEH the efficiency of our expansion is related to the rate of 
growth of the moments, which determines the number of terms needed in the series expansions. However, 
if G has a good closed-form expansion for its moment generating function (as in ina, then substantial 
computational savings exist. We study the computational savings for such G below; we may take G to be 
the bivariate gamma distribution with MGF given by 113, or the multivariate generalizations of UHl or 

(Pl . 

Our assumptions on the moment generating function of G imply the conditions for Theorem 13.11 are 
satisfied. Thus we obtain a closed-form series expansion for Hp. 

OO OO 

= E ••• E (-l)"'Mft(-^*.i,---,-^..p), (55) 

kiii—0 kijN.i—0 
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where as always 


J Ni J Ni 

^ ^ ^ ^ iy^ji kiji^Xijt^p^ k • \ — ^ ^ ^ ^ kijt- (^1^) 

i=it=i j=i t=i 

Note again that Hi depends weakly on fc; all that matters are the dot products k ■ Xi^p and the parity of 

(—1)*^'^. We argue as in Theorem l2.dl Let r = (ri,..., rp) be a P-tuple of non-negative integers. For each 

k we count the number of solutions to the system of Diophantine equations k ■ Xi^p = Xp {p G {1,..., P}) 
while recording the sign of (—= (—1)*'^. Then we have 

Theorem 3.3. Set 

S{M) = {v :v = {vi,...,vm),vi G {0,1,2,...}} 

Ki{x,r,+) = #{k G S{JNi) -.yp G [1,... ,P},k ■ Xi^p = rp,{-l)^'^ =+1} 

Ki{x,r,-) = ^{kGS{JNi)-.ypG{l,...,P},k-Xi^p = rp,{-lf'^ = -!}. (57) 

Assume the jSi^p are drawn from a one-sided multivariate distribution with parameters b and moment gen¬ 
erating function Mg.{ti,... ,tp) defined when each tp < 0. Then P(y|r2) = Hti with 

H, = {Mx,r,+) - K,ix,r,-)) ■ Mg,i-K,^i,...,-K,^p), (58) 

r6S(P) 

and the combinatorial and Diophantine estimates and bounds from AmendixT^ are still applicable, leading 
again to enormous computational savings (after an initial one time cost of determining the Ki(x,r,P.)). 

To gain additional savings in Theorem l3 . 31 we may replace the multivariate distribution with a translated 
one as in Section 12.3.11 

Further (at least if we use the multivariate distributions from il3.2ll , 77^ is a sum of the moment generating 
function, and the moment generating function is readily differentiable in terms of its parameters. Thus we 
again obtain closed-form expressions for the derivatives (see il2.3.1l a,nd Appendixand thus for certain 
data sets (where now the parameters may be correlated) there is the possibility of using Newton’s Method 
to determine the optimal values. 

Probably the most tractable and useful multivariate density will be the multivariate gamma distribution 
from Lemma E2 While all covariances will be non-negative, the moment generating function, covariances 
and correlation coefficients are given by very simple formulas, and are easily evaluated and easily differ¬ 
entiated. Moreover the multivariate gamma distribution can take on a variety of shapes, and as discussed 
in a we may further increase the admissible shapes by considering linear combinations of multivariate 
gamma distributions. 


4 CONCLUSION 


In this research we obtain closed-form expansions for the marginalization of the logit likelihood, allowing 
us to make direct inferences about the population. In general these expansions involve new special functions; 
however, in the case where the distribution of heterogeneity follows a Gamma or Multivariate Gamma 
distribution (or, in full generality, any linear combination of multivariate distributions with a closed-form 
moment generating function defined for all non-positive inputs), by re-grouping the terms in the expansions 
we obtain a rapidly converging series expansion of elementary functions. We separate the calculations into 
two pieces. The first piece is counting solutions to a system of Diophantine equations (we are finding 
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non-negative integer solutions k to k ■ Xi^p = Vp] these are linear equations with integer coefficients); the 
second is evaluating certain integrations, which depend only on and the values of the Diophantine sums. 

The advantage of this approach is clear - we need only do the first calculations once. Thus, if we have 
10® or so operations there, it is a one-time cost. When we need to evaluate the functions at related points 
(say for the Newton’s Method maximization or at the grid points), we need only evaluate the summations 
on r = (n,..., Tp) in if^ or (1^ . This grouping of terms is an enormous savings; we count the 

solutions to these systems of equations once, and save the results as expansion coefficients. 

While this research has focused on one specific case, the logit model, and two specific set of priors, the 
Gamma (if the response coefficients are independent) and Multivariate Gamma (if there may be correlations 
among the response coefficients) distributions, our hope is that this research spurs others to consider 
deriving closed-form solutions via expansions that can be made arbitrarily close. In fact, closed-form 
expansions exist for any multivariate distribution that has a closed-form moment generating function. Thus 
our expansions can incorporate correlations among the coefficients without sacrificing the computational 
gains. 

As experience with pure simulation approaches shows, i.e. those that are alternatives to that considered 
here, it is never a bad thing to have an approach that can be used to explore the parameter space (e.g. 
mode finding) in advance of running a simulation routine. Whether it is to get good starting values, or 
simply to understand the potentially multimodal nature of a posterior surface, we hope that research such 
as this provides value to researchers doing applied problems. 


A GAMMA FAMILY OF DISTRIBUTIONS 


Given the positivity restriction described in Section fl.2l for the we desired a family of distributions 
defined on the positive real line that would be extremely flexible, allowing for a variety of shapes of the 
heterogeneity distribution; and, of course, be conjugate to the geometric series expansion to the logit model. 
The Generalized Gamma family of distributions satisfies those requirements. As this work concentrated 
on the Gamma distribution, we only describe this case below. 

A.l Plots 

We give a few plots of the Gamma distribution to illustrate the richness of the family. 



G(z;l,2), G(z;1.5,3), G(z;2,4). 


While we develop the theory for (3i^p drawn from a Gamma distribution, we could use a weighted sum of 
Gamma distributions as well. 
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Sum of Weighted Gamma distributions: ^ • G{z; 1,2) + ^- G{z; 1,5): 


A.2 Integration Lemma 

We prove Lemma o 
Proof. We have 


e-^^G(z;5,n) = 


^ ( 1 ) 

bT{n) \bJ 

1 /ZN'*-! 

\b) 


bT{n) 

{l + bdy 


g b/{i+bd) 

1 


1+bd 


r(n) \b/{l + bd) 


gb/ci+bd) 


(l + 6d)-”G z, 


1 + bd' 


(59) 


As b > 0 and d > 0, 1 + 6d > 0, and the above is well defined. Note G(z; is another Gamma 

distribution and therefore integrates to 1. □ 


B MULTIVARIATE DENSITIES WITH CLOSED FORM 
MOMENT GENERATING FUNGTIONS 


In addition to the Multivariate Gamma distribution discussed in detail in Section |21 we describe two 
additional multivariate distributions that have closed-form expressions for their moment generating func¬ 
tions. As such, these distributions satisfy the conditions of Theorem l3.ll and thus lead to closed-form series 
expansions. All page references and equation numbers are from Kotz, Balakrishnan and Johnson 2000. 
By no means is this list exhaustive, but rather representative of those multivariate distributions which are 
well suited to our needs. Other distributions are Moran-Downton’s Bivariate Exponential (pages 371-377, 
especially (47.75) and (47.76)), Freund’s Multivariate Exponential (pages 388-391, especially (47.85)), 
Kibble-Moran’s Bivariate Gamma (pages 436-437), Farlie-Gumble-Morgenstern Type Bivariate Gamma 
(pages 441-442, especially (48.19) and (48.20)), and Mathai-Moschopoulos’ Multivariate Gamma (pages 
465-470, especially (48.61) and (48.62)). Other interesting distributions include truncated multivariate 
normal distributions; however, as we require one-sided distribution these are not as useful as those related 
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to the Gamma distributions. 

B.l (Arnold and Strauss’s) Bivariate Exponential (pages 370—371) 


Consider the joint probability density 
PXi,X2 {xi,X2) 


^12e-^i2^i®2-Aia:i-A2a:2 jf xi,X2 > 0 

0 otherwise, 


( 60 ) 


where Ai, A 2 , A 12 > 0 and a 1 i 2 is the normalization constant. The moment generating function is given by 


^00 pOO 


Ai2 

Ai2 


0 JQ 


^ — \l2XlX‘2 — \lXx—\2X2+tiXi-\-t2X2 


dxidx2 


^ — {X2—t2)x2 


/O 


■ nOO 

/ g-(Ai2a;2 + Ai-ti)xi^^^ 

Jo 


dxo 


Ai 2 / e 

^ r 

Ai 2 Jo 

A 

^12 / 


-{>^2-t2)x2 . 


dX2 


-(>~2-t2)x2 


^ 12 X 2 + Al — tl 
dX2 

X2 + (Ai — ti)Aj^2^ 
du 


Ai2 Jo W + (Ai — tl)(A2 — t2)Xi2 
An g-(Ai-ii)(A2-t2)/Ai2 Ej ( (Al -ti)(A2 -^ 2 ) 


^12 


\12 


where 


17V A r -tdt 

E:(.) = -J_^e - 


(61) 


(62) 


is the exponential integral function (the principal value is taken). The moment generating function exists 
for tp < Xp. For our applications such restrictions are harmless, as in Theorem rm we evaluate the 
moment generating function at ..., —Ki^p) and each Ki^p > 0. The normalization constant can be 

determined by setting ti = <2 = 0 : 


Ai2 = Ei 


(63) 


B.2 (Freund’s) Bivariate Exponential (pages 355—356) 

Freund considered the following situation: a two component instrument has components with lifetimes 
having independent density functions (when both are operating) of 


PXp 


Up e if Xp > 0 

0 otherwise. 


(64) 


where ap > 0 ; however, when one component fails the parameter of the life distribution of the other changes 
to a'f,. Thus Xi and X 2 are dependent with joint density function 


f if 0 < xi < X2 

if 0 < X 2 < xi, 

where 7 p = oi + 02 — V; see (47.25). If 7 p 7 ^ 0 then the marginal density of Xp is 


PX^Xp) = 


{ap - ap){ai + 02) , «p«3-p 


7p 


+ 


7p 


Xp > 0 . 


(65) 


( 66 ) 
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As these are mixtures of exponentials, this distribution is also called the bivariate mixture exponential. 
The moment generating function is given by 




1 


oil + a2 — ti — t2 


a[a2 

a'l - ti 


aia2 \ 

a'2-t2) ’ 


(67) 


which converges for tp < o^and ti +12 < ai + a 2 ] see (47.28). For our applications such restrictions are 
harmless, as in Theorem 18.1 1 we evaluate the moment generating function at (—AT^p,..., —Ki^p) and each 
Ki^p > 0. The correlation coefficient is given by 


corr(Ai, A 2 ) 


a'^a2““iQ^2 ^ f 1 

y/{a'i + 2aia2 + + 2aiQ;2 + af) V 3’ / ’ 


( 68 ) 


see (47.31). Thus unlike many of the other multivariate distributions, this model allows us to study 
one-sided distributions with negative correlation. 


C COMBINATORIAL AND DIOPHANTINE BOUNDS 


We use the notation of Theorem 12.31 and Theorem 12.41 


S{M) 

II 

II 

,vm),vi G {0,1,2,3,...}} 


Ki{x,r) 

= #{k G S{JN,) 

VpG {!,.. 

II 


Ki{x,r,-[-) 

= #{k G S{JN,) 

VpG {!,.. 

. , P}, fc • Xi,p = Tp, (-1)^'^ = -kl} 


K,{x,r,-) 

= #{k G SiJN,) 

VpG {!,.. 

., P}, fc • x*,p = rp, (-1)'"'^ = -1}, 

(69) 


and let ICi(r) = Ki(T,r). 

For each i we bound the number of solutions to k-Xi^p = Tp for p G {1,..., P}. Solutions to Diophantine 
equations of this nature often crucially depend upon the coefficients Xijt^p- In expanding P(Y\/3) we can 
trivially handle any terms with an xijt^p = 0. Thus, as we assume xijt^p is integral, in all arguments below 
we may assume Xijt^p > 1; if this assumption fails than trivial book-keeping in our earlier expansions 
remove the sum over kijt- The following result is immediate: 

Lemma C.l. Let Xi^p = {xm^p ,..., XijNi,p) be a J-Ni tuple of positive integers. Then Ki{x, r, ±) < Ki{r). 

Thus by Lemma RTT1 instead of analyzing Ki{x,r,±l) it suffices to bound the simpler Ki{r). 

For ease of exposition, we confine ourselves to the case where the (3i^p are drawn from a translated 
Gamma distribution, G{z — e;bp,np), and we assume Xijt^p > S; for example, we may take (5=1. Such 
bounds do not exploit the cancellation in Ki{x, r, -I-) — Ki{x, r, —) (though it is not unreasonable to expect 
square-root cancellation). It is straightforward to generalize these arguments to the Multivariate Gamma 
distribution (or linear combinations thereof) from Lemma, 13.21 

Central in the arguments below are combinatorial results about counting the number of representations 
of an integer as a sum of a hxed number of integers. We briefly recall two useful results. 

Lemma C.2. The number of ways to write a non-negative integer r as a sum of P non-negative integers 
^s rp^_f^). 

Sketch of the proof. Consider r -I- P — 1 objects in a row. Choosing P — 1 objects partitions the remaining 
r objects into P non-negative sets, and there are ways to choose P — 1 objects from r + P — 1 

objects. □ 
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Lemma C.3. The number of ways to write a non-negative integer at most R as a sum of P non-negative 
integers is = i^p)- 

Sketch of the proof. Partition R into P + 1 sets as in Lemma Ea As the last partition runs through all 
numbers from 0 to i? we get partitions of all numbers at most R into P non-negative sets. □ 


To exploit the exponential decay in (1^ from the Pi^p being drawn from translated Gamma distributions, 
we must show that Ki(r) does not grow too rapidly; we shall show it grows at most polynomially in r. Note 
such arguments ignore the decay of the (1 -I- bpYi^p -\- bprp)~'^^ factors. Assume we truncate our expansion 
by requiring 0 < km -I- • • ■ -I- fcijWi < R- As we assume that Xijt,p > b and that we are using translated 
Gamma distributions, we must bound 


E n- 

- 


j E/=1 E t=l 


(70) 


We use the notation of Section Ti. 3. II For any r, if each kijt > 0, then Lemmas IG . II and IG . 21 immediately 
yield 


Theorem C.4. We have 

K,{r) = #{k:km + --- + hjN,=r} = (71) 

Thus Ki{r) < (r -|- JNi — l)‘^'^‘“^/(JiVi — 1)!, which implies that Ki{r) grows at most polynomially. If 
^ijt,p P 1 then Ki{x,r,±) grows at most polynomially. 


We conclude with some arguments and techniques that are specihc to having the exponential decay 
from the translated Gamma distributions. These exploit improved bounds for summing Ki(r) for r in 
various ranges. We bound 


E 


p 


p—1 

fcjllH- 


By Lemma iGAl we have 


OO 


E 

r=P+l 


fr + JNi - 1 \ eSPr 

I JiVz-1 J 


ff{k : 0 < km + • ■ ■ kijp;^ < i?} 


fR + JNf\ 

V JN^ )' 


(72) 


(73) 


Remark C.5. The number of k-tuples with 'pj'pfkijt < R is If we want the approximation 

from looking at just these terms to be good, we need the sum in to be small. In this case, we initially 
need to evaluate terms, which leads to R values to store. In subseguent evaluations (note this 

encompasses not only calculating Hi but possibly also its partial derivatives required for Newton’s Method) 
we only have to read in R values, an enormous savings. The more varied the data Xijt,p is, however, the 
more tuples of dot products must be stored. 


To obtain a feel for these sizes, we tabulate the number of terms arising from different values of R, J 
and Ni. Note it is the product JNi that matters, not the values of J and Ni separately. 


R 

JNi 

/R+JNi-l\ 

V JN.i-1 } 

5 

20 


7 

20 

105.8 

9 

20 

106.8 

5 

30 

10^ 

7 

30 

106.9 

9 

30 

108.2 

5 

40 

liF’ 

7 

40 

10"" 

9 

40 

109.2 
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The largest term in the expansion of Hi in Theorem 12.21 is when all kijt = 0, giving +1. When the fc-sum 
is small (say of size s), we find terms of size We have the following trivial estimate: 


/r + JW - A 
V JN,-l ) 


< (1 + 


Assume eSP > log 2. Then the sum in m is bounded by 


2JNi-l^rlog2 


(74) 


E 2 

r—R+l 


JNi — l^r log 2^ — eSPr 


2JNi-l / g-(£5P-log2)r^^ 


R 


2 JA^i —Ig —(e(5P—log2) logP 

eSP — log 2 


(75) 


If {eSP — log 2) log i? > JNilog2, the above is small. Unfortunately, it might not be small compared to 
the contributions from terms with a small fc-sum (of size s); those contribute on the order of 

We perform a more delicate analysis by using dyadic decomposition, breaking the sum over r > i? + 1 
into blocks such as 2™i? < r < 2™+^i?, and using Lemma El in each block. As the choice function 
monotonically increasing in r, we find 


OO 


E 

r—R-\-l 


(r + JNi — 1\ e6Pr 

1 JN,-1 J 


< 


< 


< 


OO 


E E 

m—O 

CO 

E 


m—O r—2^R 

CO 


r + JNi — l'\ 


m—0 

CO 


2™+ii? + JiVA _,5P2" 
JN,, 


R 


^JNi 2’^+^ Rlog2 -e5P2'^R 


“‘“s-g 

m—0 

2 g —((e^P—2 log 2)P—JA^i log 2) 


(76) 


This is small if {eSP—2 log 2)R > JNi log 2, allowing us to replace the log R in (ei5P—log 2) log R > JNi log 2 
with R. 

A slightly better savings is attainable by using instead 


/r + JA^i-A _ f2^+^R + JNA _f2^R-l + JNA 

J - V JN. ) \ JN. J 

and using polynomial (rather than exponential) bounds. The main term is bounded by 

(2™+ip + ^ f (2JN.Y^y{JNi)\ if 2^+^R < JN. 

{JN.)\ j(2™+2p)‘^^7(JiVi)! if 2"*+!^ > JiVi 


(77) 


(78) 


In order to deduce which of the many possible expansions is best, and what size data sets are manageable, 
one needs to have explicit values for e, 5 and P; one can also try to exploit the cancellation from the (—I)*'^ 
and the denominator factors. 


D APPLYING NEWTON’S METHOD TO THE MARGINAL 

POSTERIOR 

Newton’s Method yields a sequence of points Xk such that f{xk) converges to a local maximum of /. 
If gk and Hk are the gradient and Hessian of / at Xk, then Xk+i = Xk +Pfe, where pk satisfies the linear 
equation HkPk = -gk- 
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For our problem, Q = (&, n). As the function we want to maximize is a product of terms, we maximize 
log/(6, n), as the logarithm converts the product in 11 fill to a sum. To maximize 


log/(&,n) = log]^H,(&,n) = ^logH,(&,n) 


(79) 


we need the gradient and the Hessian as in standard applications of Newton’s method. The gradient is 

V/(6,n) '^VHi(6,n) 


Vlog/(6,fi) = 


fib, n) 


= E 


Hi(&, n) 


(80) 


and the entries of the Hessian are 


where or ^ 

OX aOu ox oriT} 


— Vlog/(6,n) = ^ 


^VH,(&,n) V 

Hi(6, n) 

■ ^H,(6,n) 

ili(b,n) 

H2(&,n) 


(81) 


Straightforward differentiation gives the partial derivatives. The advantage of using a Gamma dis¬ 
tribution is the ease of differentiating and evaluating these partials. We give exact, infinite expansions; 
in practice, one truncates these expressions, and the same Diophantine calculations and computational 
savings for H^ also hold for these derivatives. Let 


B{b,fi,K{i)) = Y[{l-\-bpK^^p) 

p^i 

k ■ 1 = kill -f • ■ ■ kijisf-. 

Lemma D.l (First Derivative Expansions). 

dHi{b,n) 


(82) 


dbv 


= - E E (-1) 

fciii=0 kijN =0 


fc-i Kj^pUp 
1 -|- bpKi. 


B{b, n,K{i)) 


dH^{b,n) 

dUr, 


= - E ••• E (-l)"'log(l + 6p77*,p)-B(M,iL(z)). 

kiii—0 kijNj—0 


(83) 


As bp, Ki^p are non-negative, the logarithms are well defined above. 
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Lemma D.2 (Second Derivative Expansions). In the expansions below, p^q. 

r n„(l + rir,) 


d'^Hi{b, ft) 
962 

OO 

= E •' 

OO 

E (-1) 


fciii=0 

kiJNi—0 

d'^Hi{b,n) 

dn^ 

CXI 

= E •' 

OO 

E (-1) 

p 

kill —0 

kiJNi—0 

dupdbp 

OO 

= E •' 

OO 

E (-1) 


kiii—0 

kiJN^—O 

d'^Hi{b,n) 

dbpduq 

OO 

= E •' 

OO 

E (-1) 


kill —0 

kiJNi—0 

d'^Hi{b,n) 

dbpdbq 

OO 

= E •' 

OO 

E (-1) 


fetii=0 

kiJN^ =0 


OO 

= E •' 

fetii=0 

OO 

E (-1) 

kijNi=0 

dUpdUq 




1 “ 1 “ bpKi^p 


K, 


1 + bpKi^p 

k-iKi,pnp log(l + bgK{i, q)) 


1 + bpK, 


' X B{b,n,K{i)) 
B{b,n,K{i)) 




■ 1 Kj^pUp K{i,q)nq 
1 + bpKi^p 1 + bqK{i, q) 


B{b,n,K{i)). 


(84) 
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